fast unlock in contention #461
During contention, almost all threads are active on CPU, so unlocking fast lets those threads make progress more quickly. This significantly improves global throughput under high contention. One shortcoming is that fair unlock must now be invoked explicitly. This is an improvement to Amanieu#418.
Signed-off-by: Jay <BusyJay@users.noreply.github.com>
Benchmark runs: 9, 18, 27, and 36 threads.
parking_lot::RwLock (this pr) - [write] 1102.323 kHz [read] 2943.833 kHz
This reverts commit d43aee1. Signed-off-by: Jay <BusyJay@users.noreply.github.com>
Reimplemented the PR by maintaining the parked bit on the waker side. The new implementation is less error-prone and works with Condvar directly. The benchmark shows even more positive results.
Benchmark runs: 9, 18, 27, and 36 threads.
parking_lot::RwLock (this pr) - [write] 6121.347 kHz [read] 968.373 kHz
    {
        let mut prev = self.state.load(Ordering::Relaxed);
        let new_state = prev & !LOCKED_BIT;
        prev = self.state.swap(new_state, Ordering::Release);
There's a bug here: you may "forget" a parked thread if another thread sets PARKED_BIT between the load and swap.
In that case prev is set to PARKED_BIT | LOCKED_BIT at L104, so it can't pass the check at L105.
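To make the exchange above concrete, here is a minimal single-threaded replay of the interleaving under discussion. It is only a sketch: the bit values and the function name `replay_load_then_swap` are assumptions for illustration, not taken from the PR's source. It shows that the stale `new_state` does erase PARKED_BIT from the stored state, while the value returned by `swap` still carries PARKED_BIT, which is what the reply relies on.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Bit layout assumed for this sketch only.
const LOCKED_BIT: u8 = 0b01;
const PARKED_BIT: u8 = 0b10;

/// Replays the disputed interleaving on one thread (hypothetical helper).
/// Returns (value the unlocker gets back from `swap`, state left behind).
fn replay_load_then_swap() -> (u8, u8) {
    // The lock is held and nobody is parked yet.
    let state = AtomicU8::new(LOCKED_BIT);

    // Unlocker, step 1: read the state and compute the unlocked value.
    let prev = state.load(Ordering::Relaxed);
    let new_state = prev & !LOCKED_BIT;

    // A waiter sets PARKED_BIT between the load and the swap.
    state.fetch_or(PARKED_BIT, Ordering::Relaxed);

    // Unlocker, step 2: publish the stale `new_state` and observe the
    // value that was actually stored at that moment.
    let prev = state.swap(new_state, Ordering::Release);

    (prev, state.load(Ordering::Relaxed))
}
```

Whether a parked thread is forgotten therefore depends on the unlocker acting on the value returned by `swap` rather than on the earlier `load`.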
Bench with the command in #418:
std::sync::Mutex avg 30.795793ms min 28.369313ms max 33.668656ms
std::sync::Mutex avg 30.52266ms min 28.69828ms max 34.945486ms
This is an alternative implementation of the idea in Amanieu#461. Compared to Amanieu#461, this PR maintains the parked bit on the waiter side, so that the waker doesn't have to perform an atomic operation twice. The waker now also resets all lock state back to 0, no matter what state it was in. This makes the fast lock path more likely to succeed under high contention.
Signed-off-by: Jay <BusyJay@users.noreply.github.com>
This is an alternative, more aggressive implementation of the idea in Amanieu#461. Compared to Amanieu#461, this PR:
- maintains the parked bit on the waiter side, so that the waker doesn't have to perform an atomic operation twice.
- resets all lock state back to 0 on unlock. This makes the fast lock path more likely to succeed under high contention.
- sets PARKED_BIT even when the waiter is prevented from sleeping, so that more threads can be woken up during contention to compete for progress.
Signed-off-by: Jay <BusyJay@users.noreply.github.com>
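The effect of resetting the state to 0 on unlock can be sketched as follows. This is a toy model, not the PR's code: the bit values, the function names, and the assumption that the fast lock path is a compare-exchange from exactly 0 are all illustrative. Parking and waking are left out, so a waiter that is still parked would have to set PARKED_BIT again itself.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Bit layout assumed for this sketch only.
const LOCKED_BIT: u8 = 0b01;
const PARKED_BIT: u8 = 0b10;

/// Assumed fast lock path: succeeds only when the state is exactly 0.
fn try_lock_fast(state: &AtomicU8) -> bool {
    state
        .compare_exchange(0, LOCKED_BIT, Ordering::Acquire, Ordering::Relaxed)
        .is_ok()
}

/// Unlock that clears only LOCKED_BIT and leaves PARKED_BIT in place.
/// Returns true if a waiter should be woken.
fn unlock_keep_parked(state: &AtomicU8) -> bool {
    let prev = state.fetch_and(!LOCKED_BIT, Ordering::Release);
    prev & PARKED_BIT != 0
}

/// Unlock that resets the whole state to 0 with a single swap.
/// Returns true if a waiter should be woken.
fn unlock_reset(state: &AtomicU8) -> bool {
    let prev = state.swap(0, Ordering::Release);
    prev & PARKED_BIT != 0
}
```

In this model, after `unlock_keep_parked` the state is still `PARKED_BIT`, so `try_lock_fast` fails and the next locker falls to a slow path. After `unlock_reset` the state is 0 and `try_lock_fast` succeeds immediately.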